We use statistics to confirm effects, estimate parameters, and predict outcomes
It usually rains when I’m in Cape Town, but mostly on Sunday
Confirmation: In Cape Town, it rains more on Sundays than other days
Estimation: In Cape Town, the odds of rain on Sunday are 1.6–2.2 times higher than on other days
Prediction: I am confident that it will rain at least one Sunday the next time I go
[c]
How we interpret data like this necessarily depends on assumptions:
Is it likely our observations occurred by chance?

Is it likely they didn’t?
image
Tessa Wessels, Faces on a Train
We measure the average heights of children raised with and without vitamin A supplements
Estimate: how much taller (or shorter) are the treated children on average?
Confirmation: are we sure that the supplements are helping (or hurting)?
Range of estimates: how much do we think the supplement is helping?
We use P values to say how sure we are that we have seen some effect
We use confidence intervals to say what we think is going on (with a certain level of confidence)
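A minimal sketch of that workflow on simulated heights (the true means, spreads, and sample sizes below are invented):

```python
# Sketch of the vitamin A height comparison on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treated = rng.normal(loc=122.0, scale=6.0, size=80)   # cm, with supplement
control = rng.normal(loc=120.5, scale=6.0, size=80)   # cm, without

# Confirmation: P value for "no difference in mean height"
t, p = stats.ttest_ind(treated, control)

# Estimation: difference in means, with an approximate 95% confidence interval
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
ci = diff + np.array([-1.96, 1.96]) * se

print(f"difference {diff:.2f} cm, 95% CI ({ci[0]:.2f}, {ci[1]:.2f}), P = {p:.3f}")
```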
P values are over-rated
Never use a high P value as evidence for anything, e.g.:
that an effect is small
that two quantities are similar
We want to know if vitamin A supplements improve the health of village children
Is height a good measure of general health?
How will we know height differences are due to our treatment?
We want the two groups to start from the same point – independent randomization of each individual (sketched below)
We may measure changes in height
Or control for other factors
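A minimal sketch of independent randomization (the group size and assignment probability are illustrative):

```python
# Each child gets an independent coin flip: True = vitamin A supplement.
import numpy as np

rng = np.random.default_rng(42)
n_children = 200
treated = rng.random(n_children) < 0.5

print(f"{treated.sum()} treated, {(~treated).sum()} control")
```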
Is vitamin A good for these children?
How sure are we?
How good do we think it is?
How sure are we about that?
What does it mean if I find a “significant P value” for some effect in this experiment?
If the true effect were exactly zero, a difference this large would be unlikely to arise by chance
If I’m certain that the true answer isn’t exactly zero, why do I want the P value anyway?
[c]
image
What do these results mean?
Which are significant?
[c]
image
A high P value means we can’t see the sign of the effect clearly
A low P value means we can
image
image
More broadly, a P value measures whether we are seeing something clearly
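A small simulated illustration of "seeing clearly": the two-sided P value drops below 0.05 exactly when the 95% confidence interval excludes zero, i.e. when the sign of the effect is visible (the effect sizes and sample size below are invented).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
for true_effect in (0.2, 1.0):
    x = rng.normal(true_effect, 1.0, size=30)
    t, p = stats.ttest_1samp(x, 0.0)
    lo, hi = stats.t.interval(0.95, len(x) - 1, loc=x.mean(), scale=stats.sem(x))
    sign_clear = (lo > 0) or (hi < 0)   # does the interval exclude zero?
    print(f"true effect {true_effect}: P = {p:.3f}, CI ({lo:.2f}, {hi:.2f}), sign visible: {sign_clear}")
```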
Type I (false positive): concluding there is an effect when there isn’t one
Type II (false negative): concluding there is no effect when there really is
Type I (false positive): in the hypothetical case that the effect is exactly zero, what is the probability of falsely finding an effect?
Type II (false negative): what is the probability of failing to find an effect that is there?
These are useful to analyze power and validity of a statistical design
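A sketch of such an analysis by simulation (the effect size, noise level, sample size, and alpha are all invented choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, n_sims = 0.05, 30, 2000

def rejects_null(true_effect):
    a = rng.normal(true_effect, 1.0, size=n)
    b = rng.normal(0.0, 1.0, size=n)
    return stats.ttest_ind(a, b).pvalue < alpha

type_1 = np.mean([rejects_null(0.0) for _ in range(n_sims)])   # effect exactly zero
power = np.mean([rejects_null(0.5) for _ in range(n_sims)])    # effect really there
print(f"Type I rate ~ {type_1:.3f}, Type II rate ~ {1 - power:.3f}")
```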
[c]
Sign error: if I think an effect is positive, when it’s really negative (or vice versa)
Magnitude error: if I think an effect is small, when it’s really large (or vice versa)
Confidence intervals clarify all of this
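A sketch of how often sign and magnitude errors occur among "significant" results when a small true effect is measured noisily (the effect size, noise, and sample size are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_effect, n, n_sims = 0.2, 20, 5000

estimates, pvalues = [], []
for _ in range(n_sims):
    x = rng.normal(true_effect, 1.0, size=n)
    estimates.append(x.mean())
    pvalues.append(stats.ttest_1samp(x, 0.0).pvalue)

estimates, pvalues = np.array(estimates), np.array(pvalues)
sig = pvalues < 0.05
sign_error_rate = np.mean(estimates[sig] < 0)                 # wrong sign, among "significant" results
exaggeration = np.mean(np.abs(estimates[sig])) / true_effect  # typical magnitude error
print(f"sign error rate {sign_error_rate:.2f}, exaggeration factor {exaggeration:.1f}x")
```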
image
[c]
If I have a low P value, I can see something clearly
But it’s usually better to focus on what I see than the P value
image
[c]
If I have a high P value, there is something I don’t see clearly
It may be because this effect is small
High P values should not be used to advance your conclusion
image
Small differences
Less data
More noise
An inappropriate model
Less model resolution
A lower P value means that your evidence for difference is better
A higher P value means that your evidence for similarity is better – or worse!
[c]
image
[c]
image
Never say: A is significant and B isn’t, so A > B
Instead: Construct a statistic for the hypothesis A > B
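A sketch of testing the comparison directly, on simulated replicate measurements of the two effects (everything below is invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
effect_a = rng.normal(1.0, 1.0, size=40)   # replicate measurements of effect A
effect_b = rng.normal(0.6, 1.0, size=40)   # replicate measurements of effect B

# Test the difference between A and B directly, rather than testing each
# against zero and comparing the verdicts.
t, p = stats.ttest_ind(effect_a, effect_b)
diff = effect_a.mean() - effect_b.mean()
print(f"estimated A - B = {diff:.2f}, P = {p:.3f}")
```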
[c]
All men are mortal
Jacob Zuma is mortal
Therefore, Jacob Zuma is a man
image
[c]
All men are mortal
Fanny the elephant is mortal
Therefore, Fanny is a man
image
A lot of statistical practice works this way: we fail to see an effect clearly (a high P value), and conclude that the effect isn’t there
This sort of statistical practice leads in the aggregate to bad science
The logic can be fixed:
We can’t build statistical confidence that something is small by failing to see it clearly
We must instead see clearly that it is small
This means we need a standard for what we mean by small
image
image
People who work in respiratory clinics sometimes have to wear bulky, uncomfortable, expensive masks
They would like to switch to simpler masks, if those will do the job
How can this be tested statistically? We don’t want the masks to be “different”.
Use a confidence interval
Decide how big a difference is acceptable, and construct a P value for the hypothesis that this level is excluded!
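A sketch of an equivalence-style analysis for the mask example, with simulated leakage scores and an invented acceptability margin (a two one-sided tests, or TOST, approach):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
bulky = rng.normal(2.0, 1.0, size=60)    # leakage with the bulky mask
simple = rng.normal(2.1, 1.0, size=60)   # leakage with the simple mask
margin = 0.5                             # biggest difference we would call "good enough"

diff = simple.mean() - bulky.mean()
se = np.sqrt(simple.var(ddof=1) / len(simple) + bulky.var(ddof=1) / len(bulky))

# Confidence interval: equivalence is supported if it lies inside (-margin, +margin)
ci = diff + np.array([-1.96, 1.96]) * se

# TOST: one-sided P values that the true difference lies beyond the margin
p_upper = stats.norm.cdf((diff - margin) / se)   # H0: diff >= +margin
p_lower = stats.norm.sf((diff + margin) / se)    # H0: diff <= -margin
p_equiv = max(p_upper, p_lower)
print(f"difference {diff:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f}), equivalence P = {p_equiv:.3f}")
```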
image
[c]
image
Is the new mask “good enough”?
What’s our standard for that?
[c]
image
We can even attach a P value by basing it on the “right” statistic.
The right statistic is the thing whose sign we want to know:
Make a null model
Test whether the effect you see could be due to chance
Test whether the effect you see or a larger effect could be due to chance
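A sketch of this recipe as a permutation test on simulated heights: the null model says the group labels are arbitrary, and the P value counts how often a shuffled effect is at least as large as the observed one.

```python
import numpy as np

rng = np.random.default_rng(13)
treated = rng.normal(122.0, 6.0, size=50)
control = rng.normal(120.0, 6.0, size=50)

observed = treated.mean() - control.mean()
pooled = np.concatenate([treated, control])

null_effects = []
for _ in range(10_000):
    rng.shuffle(pooled)                    # null model: labels carry no information
    null_effects.append(pooled[:50].mean() - pooled[50:].mean())

# The effect we see, or a larger one, under the null model
p = np.mean(np.abs(null_effects) >= abs(observed))
print(f"observed difference {observed:.2f}, permutation P = {p:.4f}")
```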
image
image
image
image
image
image
[c]
Make a complete model world
Use conditional probability to calculate the probability you want
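A toy model world for the rain example: every probability is written down (all values invented), and the conditional probability we want follows directly.

```python
p_sunday = 1 / 7
p_rain_given_sunday = 0.40   # invented
p_rain_given_other = 0.25    # invented

# Total probability of rain, then the conditional probability we actually want
p_rain = p_rain_given_sunday * p_sunday + p_rain_given_other * (1 - p_sunday)
p_sunday_given_rain = p_rain_given_sunday * p_sunday / p_rain
print(f"P(rain) = {p_rain:.3f}, P(Sunday | rain) = {p_sunday_given_rain:.3f}")
```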
image
[c]
More assumptions $\implies$ more power
With great power comes great responsibility
image
We want to go from a statistical model of how our data are generated, to a probability model of parameter values
Requires prior distributions describing how plausible we assume parameter values to be before these observations are made
Use Bayes’ theorem to calculate the posterior distribution – how plausible parameter values are after taking the data into account
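A sketch of the posterior calculation on a grid for "probability of rain on a Sunday", assuming invented counts and a flat ("uninformative") prior:

```python
import numpy as np
from scipy import stats

rainy_sundays, sundays = 7, 20           # hypothetical observations
theta = np.linspace(0.001, 0.999, 999)   # candidate values of P(rain on a Sunday)

prior = np.ones_like(theta)                            # flat prior
likelihood = stats.binom.pmf(rainy_sundays, sundays, theta)
posterior = prior * likelihood
posterior /= posterior.sum()                           # Bayes' theorem, normalized on the grid

mean = np.sum(theta * posterior)
cdf = np.cumsum(posterior)
lo, hi = theta[np.searchsorted(cdf, 0.025)], theta[np.searchsorted(cdf, 0.975)]
print(f"posterior mean {mean:.2f}, 95% credible interval ({lo:.2f}, {hi:.2f})")
```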
[c]
A frequentist can do a clear analysis right away
A Bayesian needs a ton of assumptions – will try to make “uninformative” assumptions
image
[c]
Frequentist: how unlikely would an observation like this be if only chance were at work?
Bayesian: what’s my model world? What is my prior belief about weather-weekday interactions?
image
Tessa Wessels, Faces on a Train
Statistics are not a magic machine that gives you the right answer
If you are to be a serious scientist in a noisy world, you should have your own philosophy of statistics
Be pragmatic: your goal is to do science, not to get tangled up in theoretical considerations
Be honest: it’s harder than it sounds.
You can always keep analyzing until you find a “significant” result
You may also keep analyzing until you find a result that you already “know” is true.
Good practice
Keep a data-analysis journal
Start before you look at the data